Heterogeneous graph learning


Heterogeneous Graph Learning for Visual Commonsense Reasoning

Neural Information Processing Systems

The visual commonsense reasoning task aims to push the field toward cognition-level reasoning: predicting correct answers while also providing convincing reasoning paths, which yields three sub-tasks, i.e., Q->A, QA->R and Q->AR. The task poses great challenges for proper semantic alignment between the vision and language domains and for knowledge reasoning that generates persuasive reasoning paths. Existing works either resort to a powerful end-to-end network that cannot produce interpretable reasoning paths, or solely explore the intra-relationships of visual objects (a homogeneous graph) while ignoring cross-domain semantic alignment between visual concepts and linguistic words. In this paper, we propose a new Heterogeneous Graph Learning (HGL) framework that seamlessly integrates intra-graph and inter-graph reasoning to bridge the vision and language domains. Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module that interactively refine reasoning paths toward semantic agreement. Moreover, HGL integrates a contextual voting module to exploit long-range visual context for better global reasoning. Experiments on the large-scale Visual Commonsense Reasoning benchmark demonstrate the superior performance of the proposed modules on all three tasks (improving accuracy by 5% on Q->A, 3.5% on QA->R, and 5.8% on Q->AR).
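The core idea of a vision-to-answer heterogeneous graph — nodes from two domains (visual objects and answer words) connected by cross-domain edges — can be sketched as a simple cross-attention over node features. The sketch below is illustrative only and is not the authors' implementation; the function name `hetero_graph_attention` and the feature dimensions are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hetero_graph_attention(vision, answer):
    """Cross-domain graph attention: each answer-word node attends over
    all visual-object nodes, yielding vision-aware word features.
    vision: (n_obj, d) visual node features; answer: (n_word, d) word features."""
    scores = answer @ vision.T           # (n_word, n_obj) cross-domain affinities
    weights = softmax(scores, axis=-1)   # normalize over the vision nodes
    attended = weights @ vision          # (n_word, d) aggregated visual context
    # Concatenate each word with its attended visual context
    return np.concatenate([answer, attended], axis=-1)

rng = np.random.default_rng(0)
v = rng.normal(size=(5, 8))   # 5 visual-object nodes, 8-dim features
a = rng.normal(size=(3, 8))   # 3 answer-word nodes, 8-dim features
out = hetero_graph_attention(v, a)
print(out.shape)  # (3, 16)
```

In the paper's framework, such a cross-domain step is applied in both directions (vision-to-answer and question-to-answer) with learned projections rather than raw dot products.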



Reviews: Heterogeneous Graph Learning for Visual Commonsense Reasoning

Neural Information Processing Systems

Originality: The VCR task is a novel task (proposed by Zellers et al., CVPR 2019). The proposed HGL framework for this interesting task is novel and interesting. The paper applies the HGL framework on top of the baseline model (R2C from Zellers et al., CVPR 2019) and shows significant improvements. The paper compares against other existing graph learning approaches. The main difference between the proposed approach and other graph learning approaches is the heterogeneous nature (across domains: vision and language) of the graph learning framework. Quality: The paper does a good job of evaluating the proposed approach and its ablations.


Reviews: Heterogeneous Graph Learning for Visual Commonsense Reasoning

Neural Information Processing Systems

After considering the author response and discussing this submission, all reviewers recommend acceptance -- including two high ratings. The reviewers generally found the approach novel but were interested in how it applies outside of the VCR task to other question answering datasets. With the addition of the experiments from the rebuttal, this is a strong submission.


Heterogeneous Graph Learning for Explainable Recommendation over Academic Networks

Chen, Xiangtai, Tang, Tao, Ren, Jing, Lee, Ivan, Chen, Honglong, Xia, Feng

arXiv.org Artificial Intelligence

With the explosive growth in the number of new research-degree graduates every year, early-career researchers face unprecedented challenges in finding a job at a suitable institution. This study aims to understand the behavior of academic job transitions and hence recommend suitable institutions for PhD graduates. Specifically, we design a deep learning model to predict the career moves of early-career researchers and provide suggestions. The design is built on top of scholarly/academic networks, which contain abundant information about scientific collaboration among scholars and institutions. We construct a heterogeneous scholarly network to facilitate exploring the behavior of career moves and recommending institutions to scholars. We devise an unsupervised learning model called HAI (Heterogeneous graph Attention InfoMax), which combines an attention mechanism with mutual information maximization for institution recommendation. Moreover, we propose scholar attention and meta-path attention to discover the hidden relationships among several meta-paths. With these mechanisms, HAI provides ordered recommendations with explainability. We evaluate HAI on a real-world dataset against baseline methods. Experimental results verify the effectiveness and efficiency of our approach.
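Meta-path attention, as described above, scores the embeddings a scholar receives from different meta-paths and fuses them into one representation. The following is a minimal illustrative sketch, not the HAI implementation; the function name, the query vector, and the meta-path choices are assumptions.

```python
import numpy as np

def meta_path_attention(path_embs, query):
    """Fuse per-meta-path embeddings of one scholar via attention.
    path_embs: (n_paths, d), one embedding per meta-path;
    query: (d,) attention query (learned in a real model)."""
    scores = path_embs @ query               # (n_paths,) relevance of each meta-path
    e = np.exp(scores - scores.max())        # stable softmax
    w = e / e.sum()                          # attention weights over meta-paths
    fused = path_embs.T @ w                  # (d,) weighted combination
    return w, fused

rng = np.random.default_rng(1)
# e.g. embeddings from author-paper-author, author-venue-author, author-institution-author
paths = rng.normal(size=(3, 4))
query = rng.normal(size=4)
w, fused = meta_path_attention(paths, query)
print(w.shape, fused.shape)
```

The attention weights `w` are what give the recommendation its explainability: they indicate which meta-path (collaboration, venue, or institution relationship) drove the fused representation.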


Heterogeneous Graph Learning for Visual Commonsense Reasoning

Yu, Weijiang, Zhou, Jingwen, Yu, Weihao, Liang, Xiaodan, Xiao, Nong

Neural Information Processing Systems
